Abstract

There are two main methodologies for constructing the knowledge base 
of a natural language analyser: the linguistic and the data-driven. Re- 
cent state-of-the-art part-of-speech taggers are based on the data-driven 
approach. Because of the known feasibility of the linguistic rule-based 
approach at related levels of description, the success of the data-driven 
approach in part-of-speech analysis may appear surprising. In this paper 1 , 
a case is made for the syntactic nature of part-of-speech tagging. A new 
tagger of English that uses only linguistic distributional rules is outlined 
and empirically evaluated. Tested against a benchmark corpus of 38,000 
words of previously unseen text, this syntax-based system reaches an ac- 
curacy of above 99%. Compared to the 95-97% accuracy of its best com- 
petitors, this result suggests the feasibility of the linguistic approach also 
in part-of-speech analysis. 

1 Introduction 

Part-of-speech analysis usually consists of (i) introduction of ambiguity (lexi- 
cal analysis) and (ii) disambiguation (elimination of illegitimate alternatives). 
While introducing ambiguity is regarded as relatively straightforward, disam- 
biguation is known to be a difficult and controversial problem. There are two 
main methodologies: the linguistic and the data-driven. 

• In the linguistic approach, the generalisations are based on the linguist's
(potentially corpus-based) abstractions about the paradigms and syntagms
of the language. Distributional generalisations are manually coded
as a grammar, a system of constraint rules used for discarding contextually
illegitimate analyses. The linguistic approach is labour-intensive: skill
and effort are needed for writing an exhaustive grammar.

• In the data-driven approach, frequency-based information is automatically
derived from corpora. The learning corpus can consist of plain text,
but the best results seem achievable with annotated corpora (Merialdo
1994; Elworthy 1994). This corpus-based information typically concerns
sequences of 1-3 tags or words (with some well-known exceptions, e.g.
Cutting et al. 1992). Corpus-based information can be represented e.g. as
neural networks (Eineborg and Gambäck 1994; Schmid 1994), local rules
(Brill 1992), or collocational matrices (Garside 1987). In the data-driven
approach, no human effort is needed for rule-writing. However, considerable
effort may be needed for determining a workable tag set (cf. Cutting
1994) and annotating the training corpus.

1 This paper is published in Proceedings of the Seventh Conference of the European Chapter
of the Association for Computational Linguistics, Dublin, 1995.

At first blush, the linguistic approach may seem an obvious choice. A
part-of-speech tagger's task is often illustrated with a noun-verb ambiguous word
directly preceded by an unambiguous determiner (e.g. table in the table). This
ambiguity can reliably be resolved with a simple and obvious grammar rule that
disallows verbs after determiners.

Indeed, few contest the fact that reliable linguistic rules can be written for 
resolving some part-of-speech ambiguities. The main problem with this approach 
seems to be that resolving part-of-speech ambiguities on a large scale, without 
introducing a considerable error margin, is very difficult at best. At least, no 
rule-based system with a convincing accuracy has been reported so far. 2 

As a rule, data-driven systems rely on statistical generalisations about short
sequences of words or tags. Though these systems do not usually employ
information about long-distance phenomena or the linguist's abstraction
capabilities (e.g. knowledge about what is relevant in the context), they tend to reach
a 95-97% accuracy in the analysis of several languages, in particular English
(Marshall 1983; Black et al. 1992; Church 1988; Cutting et al. 1992; de Marcken
1990; DeRose 1988; Hindle 1989; Merialdo 1994; Weischedel et al. 1993; Brill
1992; Samuelsson 1994; Eineborg and Gambäck 1994, etc.). Interestingly, no
significant improvement beyond the 97% "barrier" by means of purely data-driven
systems has been reported so far.

In terms of the accuracy of known systems, the data-driven approach seems then
to provide the best model of part-of-speech distribution. This should appear a
little curious because very competitive results have been achieved using the
linguistic approach at related levels of description. With respect to computational
morphology, witness for instance the success of the Two-Level paradigm
introduced by Koskenniemi (1983): extensive morphological descriptions have been
made of more than 15 typologically different languages (Kimmo Koskenniemi,
personal communication). With regard to computational syntax, see for instance
(Güngördü and Oflazer 1994; Hindle 1983; Jensen, Heidorn and Richardson
(eds.) 1993; McCord 1990; Sleator and Temperley 1991; Alshawi (ed.) 1992;
Strzalkowski 1992). The present success of the statistical approach in
part-of-speech analysis seems then to form an exception to the general feasibility of the
rule-based linguistic approach. Is the level of parts of speech somehow different,
perhaps less rule-governed, than related levels? 3

2 There is one potential exception: the rule-based morphological disambiguator used in the
English Constraint Grammar Parser ENGCG (Voutilainen, Heikkilä and Anttila 1992). Its
recall is very high (99.7% of all words receive the correct morphological analysis), but this
system leaves 3-7% of all words ambiguous, trading precision for recall.

We do not need to assume this idiosyncratic status entirely. The rest of this 
paper argues that also parts of speech can be viewed as a rule-governed phe- 
nomenon, possible to model using the linguistic approach. However, it will also 
be argued that though the distribution of parts of speech can to some extent 
be described with rules specific to this level of representation, a more natu- 
ral account could be given using rules overtly about the form and function of 
essentially syntactic categories. A syntactic grammar appears to predict the dis- 
tribution of parts of speech as a "side effect" . In this sense parts of speech seem 
to differ from morphology and syntax: their status as an independent level of 
linguistic description appears doubtful. 

Before proceeding further with the main argument, consider three very recent
hybrids - systems that employ linguistic rules for resolving some of the
ambiguities before using automatically generated corpus-based information: collocation
matrices (Leech, Garside and Bryant 1994), Hidden Markov Models (Tapanainen
and Voutilainen 1994), or syntactic patterns (Tapanainen and Järvinen
1994). What is interesting in these hybrids is that they, unlike purely
data-driven taggers, seem capable of exceeding the 97% barrier: all three report an
accuracy of about 98.5%. 4 The success of these hybrids could be regarded as
evidence for the syntactic aspects of parts of speech.

However, the above hybrids still contain a data-driven component, i.e. it remains 
an open question whether a tagger entirely based on the linguistic approach can 
compare with a data-driven system. Next, a new system with the following 
properties is outlined and evaluated: 

• The tagger uses only linguistic distributional rules.

• Tested against a 38,000-word corpus of previously unseen text, the tagger
reaches a better accuracy than previous systems (over 99%).

• At the level of linguistic abstraction, the grammar rules are essentially
syntactic. Ideally, part-of-speech disambiguation should fall out as a "side
effect" of syntactic analysis.

3 For related discussion, cf. Sampson (1987) and Church (1992).

4 However, CLAWS4 (Leech, Garside and Bryant 1994) leaves some ambiguities unresolved;
it uses portmanteau tags for representing them.



Section 2 outlines a rule-based system consisting of the ENGCG tagger followed 
by a finite-state syntactic parser (Voutilainen and Tapanainen 1993; Voutilai- 
nen 1994) that resolves remaining part-of-speech ambiguities as a side effect. In 
Section 3, this rule-based system is tested against a 38,000-word corpus of pre- 
viously unseen text. Currently tagger evaluation is only becoming standardised; 
the evaluation method is accordingly reported in detail. 



2 System description 

The tagger consists of the following sequential components: 

• Tokeniser 

• ENGCG morphological analyser 

— Lexicon 

— Morphological heuristics 

• ENGCG morphological disambiguator 

• Lookup of alternative syntactic tags 

• Finite state syntactic disambiguator 



2.1 Morphological analysis 

The tokeniser is a rule-based system for identifying words, punctuation marks, 
document markers, and fixed syntagms (multiword prepositions, certain com- 
pounds etc.). 

The morphological description consists of two rule components: (i) the lexicon 
and (ii) heuristic rules for analysing unrecognised words. 

The English Koskenniemi-style lexicon contains over 80,000 lexical entries, each 
of which represents all inflected and some derived surface forms. The lexicon 
employs 139 tags mainly for part of speech, inflection and derivation; for
instance:

    "<that>"
        "that" <**CLB> CS
        "that" DET CENTRAL DEM SG
        "that" ADV
        "that" PRON DEM SG
        "that" <Rel> PRON SG/PL

The morphological analyser produces about 180 different tag combinations. To
contrast the ENGCG morphological description with the well-known Brown 
Corpus tags: ENGCG is more distinctive in that a part-of-speech distinction 
is spelled out in the description of (i) determiner-pronoun, (ii) preposition- 
conjunction, (iii) determiner-adverb-pronoun, and (iv) subjunctive-imperat- 
ive-infinitive-present tense homographs. On the other hand, ENGCG does not 
spell out part-of-speech ambiguity in the description of (i) -ing and nonfinite 
-ed forms, (ii) noun-adjective homographs with similar core meanings, or (iii) 
abbreviation-proper noun-common noun homographs.

"Morphological heuristics" is a rule-based module for the analysis of those
1-5% of input words not represented in the lexicon. This module employs ordered
hand-crafted rules that base their analyses on word shape. If none of the pattern
rules apply, a nominal reading is assigned as a default.
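
The behaviour of such a module can be pictured as an ordered list of word-shape patterns with a nominal default. The following Python sketch is illustrative only: the patterns, the tag strings attached to them and the function name guess_readings are assumptions made for this example, not the actual ENGCG heuristics.

```python
import re

# Hypothetical ordered word-shape rules: the first matching pattern wins.
# Patterns and tags are illustrative assumptions, not ENGCG's own rules.
SHAPE_RULES = [
    (re.compile(r"^\d+([.,]\d+)*$"), ["NUM CARD"]),   # digit strings such as 1.26
    (re.compile(r"^[A-Z][a-z]+$"), ["N NOM SG"]),     # capitalised word, treated as a noun here
    (re.compile(r".+ly$"), ["ADV"]),
    (re.compile(r".+ing$"), ["PCP1"]),
    (re.compile(r".+ed$"), ["PCP2", "V PAST"]),       # the ambiguity is kept, not resolved
]

# Nominal reading assigned when no pattern applies.
DEFAULT_READINGS = ["N NOM SG"]


def guess_readings(word):
    """Return the readings of the first matching shape rule, else the nominal default."""
    for pattern, readings in SHAPE_RULES:
        if pattern.match(word):
            return list(readings)
    return list(DEFAULT_READINGS)


if __name__ == "__main__":
    for w in ["1.26", "squarely", "tapped", "blorf"]:
        print(w, guess_readings(w))
```

Because the rules are ordered, a more specific pattern placed earlier takes precedence over a more general one placed later.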

2.2 ENGCG disambiguator 

A Constraint Grammar can be viewed as a collection 5 of pattern-action rules, 
no more than one for each ambiguity-forming tag. Each rule specifies one or 
more context patterns, or "constraints", where the tag is illegitimate. If any of 
these context patterns are satisfied during disambiguation, the tag is deleted; 
otherwise it is left intact. The context patterns can be local or global, and they 
can refer to ambiguous or unambiguous analyses. During disambiguation, the 
context can become less ambiguous. To help a pattern defining an unambiguous 
context match, several passes are made over the sentence during disambiguation. 
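
The deletion-based control structure described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the ENGCG formalism: the three constraints, the data layout and the function names are hypothetical, and the sketch additionally assumes that the last remaining reading of a word is never deleted. It shows why several passes help: one deletion can make a neighbouring context unambiguous, which lets another constraint match on the next pass.

```python
def is_unambiguous(cohort, tag):
    """True if the word has exactly one reading left and that reading is `tag`."""
    return cohort["readings"] == [tag]


# Each constraint is (tag to discard, context test). The test returns True
# when the tag is illegitimate at position i. All three are toy rules.
CONSTRAINTS = [
    # No pronoun reading directly before an unambiguous noun.
    ("PRON", lambda s, i: i + 1 < len(s) and is_unambiguous(s[i + 1], "N")),
    # No verb reading directly before an unambiguous verb.
    ("V", lambda s, i: i + 1 < len(s) and is_unambiguous(s[i + 1], "V")),
    # No verb reading directly after an unambiguous determiner.
    ("V", lambda s, i: i > 0 and is_unambiguous(s[i - 1], "DET")),
]


def disambiguate(sentence):
    """Apply all constraints repeatedly until a full pass deletes nothing."""
    changed = True
    while changed:
        changed = False
        for i, cohort in enumerate(sentence):
            for tag, illegitimate in CONSTRAINTS:
                readings = cohort["readings"]
                # Assumption of this sketch: the last reading is never deleted.
                if tag in readings and len(readings) > 1 and illegitimate(sentence, i):
                    readings.remove(tag)
                    changed = True
    return sentence


sentence = [
    {"word": "that", "readings": ["DET", "PRON"]},
    {"word": "table", "readings": ["N", "V"]},
    {"word": "fell", "readings": ["V"]},
]
disambiguate(sentence)
print([c["readings"] for c in sentence])
```

In the first pass only the verb reading of table is discarded; the pronoun reading of that can be discarded only in the second pass, once table has become an unambiguous noun.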

The current English grammar contains 1,185 linguistic constraints on the linear 
order of morphological tags. Of these, 844 specify a context that extends beyond 
the neighboring word; in this limited sense, 71% of the constraints are global. 
Interestingly, the constraints are partial and often negative paraphrases of 23 
general, essentially syntactic generalisations about the form of the noun phrase, 
the prepositional phrase, the finite verb chain etc. (Voutilainen 1994). 

The grammar avoids risky predictions; therefore 3-7% of all words remain
ambiguous (an average of 1.04-1.08 alternative analyses per output word). On the
other hand, at least 99.7% of all words retain the correct morphological analysis.
Note in passing that the ratio 1.04-1.08/99.7% compares very favourably with
other systems; cf. 3.0/99.3% by POST (Weischedel et al. 1993) and 1.04/97.6%
or 1.09/98.6% by de Marcken (1990).

There is an additional collection of 200 optionally applicable heuristic
constraints that are based on simplified linguistic generalisations. They resolve
about half of the remaining ambiguities, increasing the overall error rate to
about 0.5%.

5 Actually, it is possible to define additional heuristic rule collections that can optionally
be applied after the more reliable ones for resolving remaining ambiguities.

Most of even the remaining ambiguities are structurally resolvable. ENGCG 
leaves them pending mainly because it is prohibitively difficult to express cer- 
tain kinds of structural generalisation using the available rule formalism and 
grammatical representation. 

2.3 Syntactic analysis 

2.3.1 Finite-State Intersection Grammar 

Syntactic analysis is carried out in another reductionistic parsing framework 
known as Finite-State Intersection Grammar (Koskenniemi 1990; Koskenniemi, 
Tapanainen and Voutilainen 1992; Tapanainen 1992; Voutilainen and Tapanai- 
nen 1993; Voutilainen 1994). A short introduction: 

• Also here syntactic analysis means resolution of structural ambiguities. 
Morphological, syntactic and clause boundary descriptors are introduced 
as ambiguities with simple mappings; these ambiguities are then resolved 
in parallel. 

• The formalism does not distinguish between various types of ambiguity; 
nor are ambiguity class specific rule sets needed. A single rule often resolves 
all types of ambiguity, though superficially it may look e.g. like a rule about 
syntactic functions. 

• The grammarian can define constants and predicates using regular
expressions. For instance, the constants "." and ".." accept any features within a
morphological reading and a finite clause (that may even contain
centre-embedded clauses), respectively. Constants and predicates can be used in
rules, e.g. implication rules that are of the form

    X =>
        LC1 _ RC1,
        LC2 _ RC2,
        ...
        LCn _ RCn;

Here X, LC1, RC1, LC2 etc. are regular expressions. The rule reads: "X
is legitimate only if it occurs in context LC1 _ RC1 or in context LC2 _
RC2 ... or in context LCn _ RCn".

• Also the ambiguous sentences are represented as regular expressions.

• Before parsing, rules and sentences are compiled into deterministic
finite-state automata.

• Parsing means intersecting the (ambiguous) sentence automaton with each 
rule automaton. Those sentence readings accepted by all rule automata 
are proposed as parses. 

• In addition, heuristic rules can be used for ranking alternative analyses 
accepted by the strict rules. 
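
The idea of parsing as intersection can be illustrated with a deliberately naive Python sketch. Instead of compiling rules and sentences into finite-state automata, it enumerates every sentence reading by brute force and keeps those accepted by all rules, which gives the same result on small inputs but none of the efficiency. The example sentence, the two toy rules and all function names are assumptions made for illustration.

```python
from itertools import product

# Ambiguous sentence: each word carries its alternative (part of speech, function) analyses.
sentence = [
    ("the", [("DET", "@>N")]),
    ("table", [("N", "@SUBJ"), ("V", "@MV")]),
    ("collapsed", [("V", "@MV"), ("PCP2", "@N<")]),
]


def no_verb_after_determiner(reading):
    """Toy rule: a determiner is never directly followed by a verb."""
    return all(not (a[0] == "DET" and b[0] == "V")
               for a, b in zip(reading, reading[1:]))


def exactly_one_main_verb(reading):
    """Toy rule for a single finite clause: exactly one @MV tag."""
    return sum(1 for _, func in reading if func == "@MV") == 1


RULES = [no_verb_after_determiner, exactly_one_main_verb]


def parse(sentence, rules):
    """Return the sentence readings accepted by every rule."""
    alternatives = [analyses for _, analyses in sentence]
    return [reading for reading in product(*alternatives)
            if all(rule(reading) for rule in rules)]


parses = parse(sentence, RULES)
for reading in parses:
    print(reading)
```

Note that neither rule is specifically about part of speech or about syntactic function, yet together they leave a single reading in which both kinds of ambiguity are resolved.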



2.3.2 Grammatical representation 



The grammatical representation used in the Finite State framework is an exten- 
sion of the ENGCG syntax. Surface-syntactic grammatical relations are encoded 
with dependency-oriented functional tags. Functional representation of phrases 
and clauses has been introduced to facilitate expressing syntactic generalisa- 
tions. The representation is introduced in (Voutilainen and Tapanainen 1993; 
Voutilainen 1994); here, only the main characteristics are given: 

• Each word boundary is explicitly represented as one of five alternatives: 

— the sentence boundary "@@" 

— the boundary separating juxtaposed finite clauses "@/" 

— centre-embedded (sequences of) finite clauses are flanked with "@<" 
and "@>" 

— the plain word boundary "@" 

• Each word is furnished with a tag indicating a surface-syntactic function 
(subject, premodifier, auxiliary, main verb, adverbial, etc.). All main verbs 
are furnished with two syntactic tags, one indicating its main verb status, 
the other indicating the function of the clause. 

• An explicit difference is made between finite and nonfinite clauses. Mem- 
bers in nonfinite clauses are indicated with lower case tags; the rest with 
upper case. 

• In addition to syntactic tags, also morphological, e.g. part-of-speech tags 
are provided for each word. Let us illustrate with a simplified example. 



    @@ Mary        N     @SUBJ       @
       told        V     @MV MC@     @
       the         DET   @>N         @
       fat         A     @>N         @
       butcher's   N     @>N         @
       wife        N     @IOBJ       @
       and         CC    @CC         @
       daughters   N     @IOBJ       @/
       that        CS    @CS         @
       she         PRON  @SUBJ       @
       remembers   V     @MV OBJ@    @
       seeing      V     @mv OBJ@    @
       a           DET   @>N         @
       dream       N     @obj        @
       last        DET   @>N         @
       night       N     @ADVL       @
       @fullstop                     @@



Here Mary is a subject in a finite clause (hence the upper case); told is a 
main verb in a main clause; the, fat and butcher's are premodifiers; wife 
and daughters are indirect objects; that is a subordinating conjunction; 
remembers is a main verb in a finite clause that serves the Object role in 
a finite clause (the regent being told); seeing is a main verb in a nonfinite 
clause (hence the lower case) that also serves the Object role in a finite 
clause; dream is an object in a nonfinite clause; night is an adverbial. 
Because only boundaries separating finite clauses are indicated, there is 
only one sentence-internal clause boundary, "@/" between daughters and 
that. 



This kind of representation seeks to be (i) sufficiently expressive for stating 
grammatical generalisations in an economical and transparent fashion and (ii) 
sufficiently underspecific to make for a structurally resolvable grammatical rep- 
resentation. For example, the present way of functionally accounting for clauses 
enables the grammarian to express rules about the coordination of formally dif- 
ferent but functionally similar entities. Regarding the resolvability requirement, 
certain kinds of structurally unresolvable distinctions are never introduced. For 
instance, the premodifier tag @>N only indicates that its head is a nominal in
the right hand context. 

2.3.3 A sample rule 

Here is a realistic implication rule that partially defines the form of prepositional 
phrases: 



    PREP =>
        _ . @ Coord,
        _ ..PrepComp,
        PassVChain.. <Deferred> . _,
        PostModiCl.. <Deferred> . _,
        WH-Question.. <Deferred> . _;

A preposition is followed by a coordination or a preposition complement (here
hidden in the constant ..PrepComp that accepts e.g. noun phrases, nonfinite
clauses and nominal clauses), or it (as a 'deferred' preposition) is preceded by
a passive verb chain PassVChain.. or a postmodifying clause PostModiCl.. (the
main verb in a postmodifying clause is furnished with the postmodifier tag
N<@) or a WH-question WH-Question.. (i.e. in the same clause, there is a
WH-word). If the tag PREP occurs in none of the specified contexts, the sentence
reading containing it is discarded.
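
The semantics of an implication rule (a reading is discarded if some occurrence of X matches none of the listed contexts) can be approximated with Python's re module. The sketch below is a simplification under stated assumptions: a sentence reading is flattened to a string of space-separated part-of-speech tags, the two contexts are toy stand-ins for ..PrepComp and PassVChain.., and the function name satisfies is hypothetical. The real system compiles such rules into automata rather than scanning strings.

```python
import re


def satisfies(reading, x, contexts):
    """True if every match of `x` in `reading` occurs in at least one allowed context.

    `contexts` is a list of (left, right) regular expressions: the left one must
    match immediately before X, the right one immediately after it.
    """
    for m in re.finditer(x, reading):
        left, right = reading[:m.start()], reading[m.end():]
        in_some_context = any(
            re.search("(?:" + lc + r")\Z", left) and re.match(rc, right)
            for lc, rc in contexts
        )
        if not in_some_context:
            return False
    return True


PREP = r"\bPREP\b"
PREP_CONTEXTS = [
    # Toy stand-in for "_ ..PrepComp": followed by a simplified noun phrase.
    ("", r" (?:DET )?(?:A )*N\b"),
    # Toy stand-in for "PassVChain.. <Deferred> . _": preceded by a past participle.
    (r"\bPCP2 ", ""),
]

print(satisfies("V PREP DET A N", PREP, PREP_CONTEXTS))   # preposition with complement
print(satisfies("N V PCP2 PREP", PREP, PREP_CONTEXTS))    # deferred preposition
print(satisfies("DET PREP V", PREP, PREP_CONTEXTS))       # no licensing context
```

A reading for which satisfies returns False would be removed from the set of sentence readings; a reading without any occurrence of X passes the rule vacuously.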

A comprehensive parsing grammar is under development. Currently it accounts 
for all major syntactic structures of English, but in a somewhat underspecific 
fashion. Though the accuracy of the grammar at the level of syntactic analysis 
can still be considerably improved, the syntactic grammar is already capable of 
resolving morphological ambiguities left pending by ENGCG. 

3 An experiment with part-of-speech disambiguation

The system was tested against a 38,202-word test corpus consisting of previously 
unseen journalistic, scientific and manual texts. 

The finite-state parser, the last module in the system, can in principle be 
"forced" to produce an unambiguous analysis for each input sentence, even for 
ungrammatical ones. In practice, the present implementation sometimes fails to 
give an analysis to heavily ambiguous inputs, regardless of their grammaticality. 6 
Therefore two kinds of output were accepted for the evaluation: (i) the un- 
ambiguous analyses actually proposed by the finite-state parser, and (ii) the 
ENGCG analysis of those sentences for which the finite-state parser gave no 
analyses. From this nearly unambiguous combined output, the success of the 
hybrid was measured, by automatically comparing it with a benchmark ver- 
sion of the test corpus at the level of morphological (including part-of-speech) 
analysis (i.e. the syntax tags were ignored). 
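
The evaluation set-up just described can be sketched as follows. This is a minimal illustration, not the evaluation software actually used: the data layout (one set of retained readings per word, None for sentences the parser could not analyse) and the function names combine and evaluate are assumptions, and the tiny data set is invented purely to exercise the code.

```python
def combine(fs_output, engcg_output):
    """Per sentence, take the finite-state parser's analysis if it produced one,
    otherwise fall back to the ENGCG analysis of that sentence."""
    return [fs if fs is not None else cg
            for fs, cg in zip(fs_output, engcg_output)]


def evaluate(system_sentences, benchmark):
    """Compare retained readings with the benchmark, word by word.

    A word counts as ambiguous if more than one reading is retained, and as an
    error if the benchmark reading is not among the retained readings.
    """
    words = [readings for sent in system_sentences for readings in sent]
    assert len(words) == len(benchmark)
    ambiguous = sum(1 for readings in words if len(readings) > 1)
    errors = sum(1 for readings, gold in zip(words, benchmark)
                 if gold not in readings)
    return {"ambiguity_rate": ambiguous / len(words),
            "error_rate": errors / len(words)}


# Invented toy data: the parser analysed the first sentence but not the second.
fs_output = [[{"DET"}, {"N"}], None]
engcg_output = [[{"DET"}, {"N", "V"}], [{"ADV"}, {"V", "N"}]]
benchmark = ["DET", "N", "A", "V"]

result = evaluate(combine(fs_output, engcg_output), benchmark)
print(result)
```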

3.1 Creation of benchmark corpus 

The benchmark corpus was created by first applying the preprocessor and
morphological analyser to the test text. This morphologically analysed ambiguous
text was then independently disambiguated by two experts whose task also was
to detect any errors potentially produced by the previously applied components.
They worked independently, consulting written documentation of the
grammatical representation when necessary. Then these manually disambiguated versions
were automatically compared. At this stage, slightly over 99% of all analyses
were identical. When the differences were collectively examined, it was agreed
that virtually all were due to inattention. 7 One of these two corpus versions was
modified to represent the consensus, and this 'consensus corpus' was used as the
benchmark in the evaluation. 8

6 During the intersection, the sentence automaton sometimes becomes prohibitively large.

                           ambiguous               readings/            error
                           words       readings    word        errors   rate
    D0 (Morph. analysis)   39.0%       67,737      1.77        31       0.08%
    D1 (D0 + ENGCG)         6.2%       40,450      1.06        124      0.32%
    D2 (D1 + ENGCG heur.)   3.2%       38,949      1.02        226      0.59%
    D3 (D2 + FS parser)     0.6%       38,342      1.00        281      0.74%

Figure 1: Results from a tagging test on a 38,202-word corpus.

3.2 Results 

The results are given in Figure 1. 

Let us examine the results. ENGCG accuracy was close to normal, except that
the heuristic constraints (tagger D2) performed somewhat worse than usual.

The finite-state parser gave an analysis to about 80% of all words. Overall, 0.6%
of all words remained ambiguous (due to the failure of the Finite State parser;
cf. Section 3). Parsing speed varied greatly (0.1-150 words/sec.) - refinement
of the Finite State software is still underway.

The overall success of the system is very encouraging - 99.26% of all words 
retained the correct morphological analysis. Compared to the 95-97% accu- 
racy of the best competing probabilistic part-of-speech taggers, this accuracy, 
achieved with an entirely rule-based description, suggests that part-of-speech 
disambiguation is a syntactic problem. 
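
The derived columns of Figure 1 and the 99.26% figure can be checked with a few lines of Python. The sketch assumes that readings/word and the error rate are both computed against the 38,202 words of the test corpus; the paper does not spell out the denominators, but the reported values are consistent with this reading. The function name summarise is hypothetical.

```python
CORPUS_WORDS = 38_202

# (stage, readings, errors) as reported in Figure 1.
FIGURE_1 = [
    ("D0", 67_737, 31),
    ("D1", 40_450, 124),
    ("D2", 38_949, 226),
    ("D3", 38_342, 281),
]


def summarise(readings, errors, words=CORPUS_WORDS):
    """Return (readings per word, error rate in %, share of correct words in %)."""
    error_rate = 100.0 * errors / words
    return readings / words, error_rate, 100.0 - error_rate


for stage, readings, errors in FIGURE_1:
    per_word, error_rate, correct = summarise(readings, errors)
    print(f"{stage}: {per_word:.2f} readings/word, "
          f"{error_rate:.2f}% errors, {correct:.2f}% correct")
```

For D3 this gives 281 / 38,202, i.e. an error rate of about 0.74%, so about 99.26% of all words retain the correct analysis.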

The misanalyses have not been studied in detail, but some general observations 
can be made: 

• Many misanalyses made by the Finite State parser were due to ENGCG
misanalyses (the "domino effect").

• The choice between adverbs and other categories was sometimes difficult.
The distributions of adverbs and certain other categories overlap; this
may explain this error type. Lexeme-oriented constraints could be
formulated for some of these cases.

• Some ambiguities, e.g. noun-verb and participle-past tense, were
problematic. This is probably due to the fact that while the parsing grammar
always requires a regent for a dependent, it is much more permissive on
dependentless regents. Clause boundaries, and hence the internal structure
of clauses, could probably be determined more accurately if the heuristic
part of the grammar also contained rules for preferring e.g. verbs with
typical complements over verbs without complements.

7 Only in the analysis of a few headings, different (meaning-level) interpretations arose, and
even here it was agreed by both judges that this ambiguity was genuine.

8 If this high consensus level appears surprising, see Voutilainen and Järvinen (this volume).

4 Conclusion 

Part-of-speech disambiguation has recently been tackled best with data-driven 
techniques. Linguistic techniques have done well at related levels (morphology, 
syntax) but not here. Is there something in parts of speech that makes them 
less accessible to the rule-based linguistic approach? 

This paper outlines and evaluates a new part-of-speech tagger. It uses only 
linguistic distributional rules, yet reaches an accuracy clearly better than any 
competing system. This suggests that also parts of speech are a rule-governed 
distributional phenomenon. 

The tagger has two rule components. One is a grammar specifically developed 
for resolution of part-of-speech ambiguities. Though much effort was given to 
its development, it leaves many ambiguities unresolved. These rules, superfi- 
cially about parts of speech, actually express essentially syntactic generalisa- 
tions, though indirectly and partially. The other rule component is a syntactic 
grammar. This syntactic grammar is able to resolve the pending part-of-speech 
ambiguities as a side effect. 

In short: like morphology and syntax, parts of speech seem to be a rule-governed 
phenomenon. However, the best distributional account of parts of speech appears 
achievable by means of a syntactic grammar. 9

ENGCG-style corpus tools, e.g. NPtool (Voutilainen 1993).

Appendix



Enclosed is a sample output of the system. Syntax tags have been retained; base 
forms and some tags have been removed for better readability. The syntactic 
tags used here are the following: 

• @>A premodifier of adjective, adverb or quantifier, 

• @>N noun premodifier, 

• @N< noun postmodifier, 

• @ADVL adverbial, 

• @ADVL/N< adverbial or noun postmodifier, 

• @OBJ object in a finite clause, 

• @IOBJ indirect object in a finite clause, 

• @SUBJ subject in a finite clause, 

• @obj object in a nonfinite clause, 

• @P<< preposition complement, 

• @nh nominal head, 

• @CC coordinating conjunction, 

• @CS subordinating conjunction, 

• @MV main verb in a finite clause, 

• @aux auxiliary in a nonfinite clause, 

• @mv main verb in a nonfinite clause, 

• ADVL@ adverbial clause, 

• MC@ finite main clause, 

• OBJ@ clause as an object in a finite clause. 

@@ On PREP @ADVL @
completion N NOM SG @P<< @
@comma @
check V IMP @MV MC@ @
the DET CENTRAL SG/PL @>N @
engine N NOM SG @>N @
oil N NOM SG @>N @
level N NOM SG @OBJ @/
@comma @
start V IMP @MV MC@ @
the DET CENTRAL SG/PL @>N @
engine N NOM SG @OBJ @/
then ADV ADVL @ADVL @
check V IMP @MV MC@ @
for PREP @ADVL @
oil N NOM SG @>N @
leaks N NOM PL @P<< @
@fullstop @@

@@ Screw V IMP @MV MC@ @
a DET CENTRAL SG @>N @
self-tapping PCP1 @>N @
screw N NOM SG @OBJ @
of PREP @N< @
appropriate A ABS @>N @
diameter N NOM SG @P<< @
into PREP @ADVL/N< @
this DET CENTRAL DEM SG @>N @
hole N NOM SG @P<< @/
@comma @
then ADV ADVL @ADVL @
lever V IMP @MV MC@ @
against PREP @ADVL @
the DET CENTRAL SG/PL @>N @
screw N NOM SG @P<< @
to INFMARK> @aux @
extract V INF @mv ADVL@ @
the DET CENTRAL SG/PL @>N @
plug N NOM SG @obj @
as CS @CS @
shown PCP2 @mv ADVL@ @
in PREP @ADVL @
FIG ABBR NOM SG @>N @
1.26 NUM CARD @P<< @
@fullstop @@

@@ This PRON DEM SG @nh @
done PCP2 @N< @
@comma @
push V IMP @MV MC@ @
the DET CENTRAL SG/PL @>N @
crankshaft N NOM SG @OBJ @
fully ADV @>A @
rearwards ADV @ADVL @/
@comma @
then ADV ADVL @ADVL @
slowly ADV @ADVL @
but CC @CC @
positively ADV @ADVL @
push V IMP @MV MC@ @
it PRON ACC SG3 @OBJ @
forwards ADV ADVL @ADVL @
to PREP @ADVL @
its PRON GEN SG3 @>N @
stop N NOM SG @P<< @
@fullstop @@

@@ Lightly ADV @ADVL @
moisten V IMP @MV MC@ @
the DET CENTRAL SG/PL @>N @
lips N NOM PL @OBJ @
of PREP @N< @
a DET CENTRAL SG @>N @
new A ABS @>N @
rear N NOM SG @>N @
oil N NOM SG @>N @
seal N NOM SG @P<< @
with PREP @ADVL/N< @
engine N NOM SG @>N @
oil N NOM SG @P<< @/
@comma @
then ADV ADVL @ADVL @
drive V IMP @MV MC@ @
it PRON ACC SG3 @OBJ @
squarely ADV @ADVL @
into PREP @ADVL @
position N NOM SG @P<< @/
until CS @CS @
it PRON NOM SG3 SUBJ @SUBJ @
rests V PRES SG3 @MV ADVL@ @
against PREP @ADVL @
its PRON GEN SG3 @>N @
abutment N NOM SG @P<< @
@comma @
preferably ADV @ADVL @
using PCP1 @mv ADVL@ @
the DET CENTRAL SG/PL @>N @
appropriate A ABS @>N @
service N NOM SG @>N @
tool N NOM SG @obj @
for PREP @ADVL/N< @
this DET CENTRAL DEM SG @>N @
operation N NOM SG @P<< @
@fullstop @@



